Overview

Dataset statistics

Number of variables: 14
Number of observations: 29674
Missing cells: 13
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.2 MiB
Average record size in memory: 112.0 B

Variable types

NUM: 12
BOOL: 2

Reproduction

Analysis started: 2020-05-24 15:33:18.415132
Analysis finished: 2020-05-24 15:34:33.208095
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

AGE is highly correlated with AGE_GROUP [High Correlation]
AGE_GROUP is highly correlated with AGE [High Correlation]
LIMIT_BAL is highly correlated with LIMIT_BAL_GROUP [High Correlation]
LIMIT_BAL_GROUP is highly correlated with LIMIT_BAL [High Correlation]
PAY_AMT2 is highly skewed (γ1 = 30.31213063) [Skewed]
BILL_AMT5 has 3211 (10.8%) zeros [Zeros]
PAY_AMT1 has 4931 (16.6%) zeros [Zeros]
PAY_AMT2 has 5075 (17.1%) zeros [Zeros]
PAY_AMT3 has 5647 (19.0%) zeros [Zeros]
PAY_AMT4 has 6087 (20.5%) zeros [Zeros]
PAY_AMT5 has 6382 (21.5%) zeros [Zeros]
PAY_AMT6 has 6852 (23.1%) zeros [Zeros]
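
The zero and skewness warnings rest on two simple per-column quantities. The sketch below shows one way to compute them; it is illustrative only. The function names and the toy values are hypothetical, and the moment-based skewness formula is an assumption (the report does not say which estimator or which warning thresholds it uses).

```python
def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up payment amounts: several zeros and one very large payment.
toy_payments = [0, 0, 1000, 2000, 2000, 3000, 5000, 250000]

print(f"zeros: {zero_share(toy_payments):.1%}")
print(f"skewness: {skewness(toy_payments):.2f}")

# The percentage in the BILL_AMT5 warning follows from counts given in the report.
bill_amt5_zero_pct = 100 * 3211 / 29674
print(f"BILL_AMT5 zeros: {bill_amt5_zero_pct:.1f}%")
```

A single extreme value is enough to push the skewness of the toy column well above that of a symmetric column, which is the pattern the PAY_AMT columns show.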

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 29674
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14957.007514996292
Minimum: 0
Maximum: 30000
Zeros: 1
Zeros (%): < 0.1%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1483.65
Q1: 7451.25
Median: 14936.5
Q3: 22454.75
95th percentile: 28490.35
Maximum: 30000
Range: 30000
Interquartile range (IQR): 15003.5

Descriptive statistics

Standard deviation: 8662.794748
Coefficient of variation (CV): 0.5791796747
Kurtosis: -1.200223035
Mean: 14957.00751
Mean absolute deviation (MAD): 7502.011839
Skewness: 0.005745492805
Sum: 443834241
Variance: 75044012.84
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 30000.], "bayesian blocks" binning strategy used)
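
Several of the statistics reported for each variable are derived from others. As a quick consistency check, the sketch below recomputes four of them from the df_index figures; the variable names are illustrative, and the input numbers are copied from the report.

```python
# Figures copied from the df_index statistics in this report.
mean = 14957.00751
std = 8662.794748
q1, q3 = 7451.25, 22454.75
minimum, maximum = 0, 30000

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV       = {cv:.10f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.2f}")
```

The same relations hold for every numeric variable in the report, so they are a cheap way to spot transcription errors.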
Common values
Value  Count  Frequency (%)
2047  1  < 0.1%
11535  1  < 0.1%
23841  1  < 0.1%
17698  1  < 0.1%
19747  1  < 0.1%
29988  1  < 0.1%
25894  1  < 0.1%
27943  1  < 0.1%
5416  1  < 0.1%
7465  1  < 0.1%
Other values (29664)  29664  > 99.9%

Minimum 5 values
Value  Count  Frequency (%)
0  1  < 0.1%
1  1  < 0.1%
2  1  < 0.1%
3  1  < 0.1%
4  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
30000  1  < 0.1%
29999  1  < 0.1%
29998  1  < 0.1%
29997  1  < 0.1%
29996  1  < 0.1%

GRADUATE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 232.0 KiB

Value  Count  Frequency (%)
0  19249  64.9%
1  10424  35.1%
(Missing)  1  < 0.1%

AGE_GROUP
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 8
Unique (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4628113099450677
Minimum: 1.0
Maximum: 8.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 7
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.791121383
Coefficient of variation (CV): 0.5172448692
Kurtosis: -0.6465529551
Mean: 3.46281131
Mean absolute deviation (MAD): 1.510043384
Skewness: 0.5046278967
Sum: 102752
Variance: 3.208115809
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
2  7050  23.8%
3  5728  19.3%
4  4845  16.3%
1  3833  12.9%
5  3577  12.1%
6  2381  8.0%
7  1988  6.7%
8  271  0.9%
(Missing)  1  < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
1  3833  12.9%
2  7050  23.8%
3  5728  19.3%
4  4845  16.3%
5  3577  12.1%

Maximum 5 values
Value  Count  Frequency (%)
8  271  0.9%
7  1988  6.7%
6  2381  8.0%
5  3577  12.1%
4  4845  16.3%

BILL_AMT5
Real number (ℝ)

ZEROS
Distinct count: 21010
Unique (%): 70.8%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 40753.74441411384
Minimum: -81334.0
Maximum: 927171.0
Zeros: 3211
Zeros (%): 10.8%
Memory size: 232.0 KiB

Quantile statistics

Minimum: -81334
5th percentile: 0
Q1: 1981
Median: 18413
Q3: 50673
95th percentile: 166749
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48692

Descriptive statistics

Standard deviation: 60984.2006
Coefficient of variation (CV): 1.496407299
Kurtosis: 12.20015426
Mean: 40753.74441
Mean absolute deviation (MAD): 41395.08864
Skewness: 2.862787937
Sum: 1209285858
Variance: 3719072723
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  3211  10.8%
390  233  0.8%
780  94  0.3%
316  79  0.3%
326  62  0.2%
150  57  0.2%
396  45  0.2%
416  36  0.1%
2500  34  0.1%
2400  31  0.1%
Other values (21000)  25791  86.9%

Minimum 5 values
Value  Count  Frequency (%)
-81334  1  < 0.1%
-61372  1  < 0.1%
-53007  1  < 0.1%
-46627  1  < 0.1%
-37594  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
927171  1  < 0.1%
823540  1  < 0.1%
587067  1  < 0.1%
551702  1  < 0.1%
547880  1  < 0.1%

PAY_AMT1
Real number (ℝ≥0)

ZEROS
Distinct count: 7943
Unique (%): 26.8%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5725.735180130085
Minimum: 0.0
Maximum: 873552.0
Zeros: 4931
Zeros (%): 16.6%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1000
Median: 2157
Q3: 5025
95th percentile: 18610.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4025

Descriptive statistics

Standard deviation: 16643.64462
Coefficient of variation (CV): 2.906813552
Kurtosis: 411.5383141
Mean: 5725.73518
Mean absolute deviation (MAD): 5959.618873
Skewness: 14.60546767
Sum: 169899740
Variance: 277010906.2
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  4931  16.6%
2000  1362  4.6%
3000  891  3.0%
5000  698  2.4%
1500  507  1.7%
4000  426  1.4%
10000  401  1.4%
1000  363  1.2%
2500  298  1.0%
6000  294  1.0%
Other values (7933)  19502  65.7%

Minimum 5 values
Value  Count  Frequency (%)
0  4931  16.6%
1  9  < 0.1%
2  14  < 0.1%
3  15  0.1%
4  18  0.1%

Maximum 5 values
Value  Count  Frequency (%)
873552  1  < 0.1%
505000  1  < 0.1%
493358  1  < 0.1%
423903  1  < 0.1%
405016  1  < 0.1%

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 7899
Unique (%): 26.6%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5986.2915782024065
Minimum: 0.0
Maximum: 1684259.0
Zeros: 5075
Zeros (%): 17.1%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1000
Median: 2037
Q3: 5000
95th percentile: 19233.6
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4000

Descriptive statistics

Standard deviation: 23159.08083
Coefficient of variation (CV): 3.868685734
Kurtosis: 1625.729375
Mean: 5986.291578
Mean absolute deviation (MAD): 6522.828405
Skewness: 30.31213063
Sum: 177631230
Variance: 536343024.7
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  5075  17.1%
2000  1290  4.3%
3000  857  2.9%
5000  717  2.4%
1000  594  2.0%
1500  521  1.8%
4000  410  1.4%
10000  318  1.1%
6000  283  1.0%
2500  251  0.8%
Other values (7889)  19357  65.2%

Minimum 5 values
Value  Count  Frequency (%)
0  5075  17.1%
1  15  0.1%
2  20  0.1%
3  18  0.1%
4  11  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
1684259  1  < 0.1%
1227082  1  < 0.1%
1215471  1  < 0.1%
1024516  1  < 0.1%
580464  1  < 0.1%

PAY_AMT3
Real number (ℝ≥0)

ZEROS
Distinct count: 7518
Unique (%): 25.3%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5283.145283591143
Minimum: 0.0
Maximum: 896040.0
Zeros: 5647
Zeros (%): 19.0%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 444
Median: 1880
Q3: 4600
95th percentile: 18000
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4156

Descriptive statistics

Standard deviation: 17695.15307
Coefficient of variation (CV): 3.349359543
Kurtosis: 558.9924934
Mean: 5283.145284
Mean absolute deviation (MAD): 5907.410461
Skewness: 17.13796108
Sum: 156766770
Variance: 313118442.2
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  5647  19.0%
2000  1285  4.3%
1000  1103  3.7%
3000  870  2.9%
5000  721  2.4%
1500  490  1.7%
4000  381  1.3%
10000  312  1.1%
1200  243  0.8%
6000  241  0.8%
Other values (7508)  18380  61.9%

Minimum 5 values
Value  Count  Frequency (%)
0  5647  19.0%
1  13  < 0.1%
2  19  0.1%
3  14  < 0.1%
4  15  0.1%

Maximum 5 values
Value  Count  Frequency (%)
896040  1  < 0.1%
889043  1  < 0.1%
508229  1  < 0.1%
417588  1  < 0.1%
400972  1  < 0.1%

PAY_AMT4
Real number (ℝ≥0)

ZEROS
Distinct count: 6937
Unique (%): 23.4%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4879.136959525495
Minimum: 0.0
Maximum: 621000.0
Zeros: 6087
Zeros (%): 20.5%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 324
Median: 1502
Q3: 4100
95th percentile: 16361.2
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3776

Descriptive statistics

Standard deviation: 15744.04332
Coefficient of variation (CV): 3.226809055
Kurtosis: 274.6837041
Mean: 4879.13696
Mean absolute deviation (MAD): 5569.795488
Skewness: 12.84474608
Sum: 144778631
Variance: 247874900.2
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  6087  20.5%
1000  1394  4.7%
2000  1214  4.1%
3000  887  3.0%
5000  810  2.7%
1500  441  1.5%
4000  402  1.4%
10000  341  1.1%
2500  259  0.9%
500  258  0.9%
Other values (6927)  17580  59.2%

Minimum 5 values
Value  Count  Frequency (%)
0  6087  20.5%
1  22  0.1%
2  22  0.1%
3  13  < 0.1%
4  20  0.1%

Maximum 5 values
Value  Count  Frequency (%)
621000  1  < 0.1%
528897  1  < 0.1%
497000  1  < 0.1%
432130  1  < 0.1%
400046  1  < 0.1%

PAY_AMT5
Real number (ℝ≥0)

ZEROS
Distinct count: 6897
Unique (%): 23.2%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4852.153607656793
Minimum: 0.0
Maximum: 426529.0
Zeros: 6382
Zeros (%): 21.5%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 315
Median: 1544
Q3: 4100
95th percentile: 16220.6
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3785

Descriptive statistics

Standard deviation: 15353.94255
Coefficient of variation (CV): 3.164356241
Kurtosis: 178.3092098
Mean: 4852.153608
Mean absolute deviation (MAD): 5518.671954
Skewness: 11.07463591
Sum: 143977954
Variance: 235743551.9
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  6382  21.5%
1000  1340  4.5%
2000  1323  4.5%
3000  947  3.2%
5000  814  2.7%
1500  426  1.4%
4000  401  1.4%
10000  343  1.2%
500  250  0.8%
6000  247  0.8%
Other values (6887)  17200  58.0%

Minimum 5 values
Value  Count  Frequency (%)
0  6382  21.5%
1  21  0.1%
2  13  < 0.1%
3  13  < 0.1%
4  12  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
426529  1  < 0.1%
417990  1  < 0.1%
388071  1  < 0.1%
379267  1  < 0.1%
332000  1  < 0.1%

PAY_AMT6
Real number (ℝ≥0)

ZEROS
Distinct count: 6939
Unique (%): 23.4%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5272.854177198126
Minimum: 0.0
Maximum: 528666.0
Zeros: 6852
Zeros (%): 23.1%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 200
Median: 1500
Q3: 4032
95th percentile: 17543.6
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3832

Descriptive statistics

Standard deviation: 17866.70954
Coefficient of variation (CV): 3.388432325
Kurtosis: 165.4893385
Mean: 5272.854177
Mean absolute deviation (MAD): 6246.817918
Skewness: 10.58823937
Sum: 156461402
Variance: 319219309.8
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0  6852  23.1%
1000  1299  4.4%
2000  1295  4.4%
3000  914  3.1%
5000  808  2.7%
1500  439  1.5%
4000  411  1.4%
10000  356  1.2%
500  247  0.8%
6000  220  0.7%
Other values (6929)  16832  56.7%

Minimum 5 values
Value  Count  Frequency (%)
0  6852  23.1%
1  20  0.1%
2  9  < 0.1%
3  14  < 0.1%
4  12  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
528666  1  < 0.1%
527143  1  < 0.1%
443001  1  < 0.1%
422000  1  < 0.1%
403500  1  < 0.1%

LIMIT_BAL_GROUP
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 7
Unique (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2573046203619453
Minimum: 1.0
Maximum: 7.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 3
Q3: 5
95th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.827953897
Coefficient of variation (CV): 0.5611860449
Kurtosis: -1.36848069
Mean: 3.25730462
Mean absolute deviation (MAD): 1.62979723
Skewness: 0.1682842791
Sum: 96654
Variance: 3.34141545
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
1  7607  25.6%
5  4997  16.8%
2  4784  16.1%
6  4300  14.5%
4  3915  13.2%
3  3864  13.0%
7  206  0.7%
(Missing)  1  < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
1  7607  25.6%
2  4784  16.1%
3  3864  13.0%
4  3915  13.2%
5  4997  16.8%

Maximum 5 values
Value  Count  Frequency (%)
7  206  0.7%
6  4300  14.5%
5  4997  16.8%
4  3915  13.2%
3  3864  13.0%

default
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 232.0 KiB

Value  Count  Frequency (%)
0  23121  77.9%
1  6552  22.1%
(Missing)  1  < 0.1%

AGE
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 56
Unique (%): 0.2%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.501870387220706
Minimum: 21.0
Maximum: 79.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 21
5th percentile: 23
Q1: 28
Median: 34
Q3: 41
95th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.232184001
Coefficient of variation (CV): 0.26004782
Kurtosis: 0.03836190383
Mean: 35.50187039
Mean absolute deviation (MAD): 7.560042396
Skewness: 0.7305412748
Sum: 1053447
Variance: 85.23322143
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
29  1575  5.3%
27  1463  4.9%
28  1387  4.7%
30  1382  4.7%
26  1243  4.2%
31  1200  4.0%
25  1176  4.0%
34  1146  3.9%
32  1144  3.9%
33  1137  3.8%
Other values (46)  16820  56.7%

Minimum 5 values
Value  Count  Frequency (%)
21  67  0.2%
22  556  1.9%
23  916  3.1%
24  1118  3.8%
25  1176  4.0%

Maximum 5 values
Value  Count  Frequency (%)
79  1  < 0.1%
75  3  < 0.1%
74  1  < 0.1%
73  4  < 0.1%
72  3  < 0.1%

LIMIT_BAL
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 81
Unique (%): 0.3%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 167330.8961008324
Minimum: 10000.0
Maximum: 1000000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 232.0 KiB

Quantile statistics

Minimum: 10000
5th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129876.6449
Coefficient of variation (CV): 0.7761665531
Kurtosis: 0.5456392649
Mean: 167330.8961
Mean absolute deviation (MAD): 105037.984
Skewness: 0.9974503939
Sum: 4965209680
Variance: 1.686794288e+10
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
50000  3327  11.2%
20000  1963  6.6%
30000  1597  5.4%
80000  1544  5.2%
200000  1495  5.0%
150000  1096  3.7%
100000  1040  3.5%
180000  980  3.3%
360000  833  2.8%
60000  825  2.8%
Other values (71)  14973  50.5%

Minimum 5 values
Value  Count  Frequency (%)
10000  488  1.6%
16000  2  < 0.1%
20000  1963  6.6%
30000  1597  5.4%
40000  230  0.8%

Maximum 5 values
Value  Count  Frequency (%)
1000000  1  < 0.1%
800000  2  < 0.1%
780000  2  < 0.1%
760000  1  < 0.1%
750000  4  < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
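
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report tool runs; the function name is hypothetical, and the input values are AGE and AGE_GROUP as read from the ten "First rows" of the sample.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations; the (n - 1) factors cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# AGE and AGE_GROUP from the ten "First rows" of the sample.
age = [24, 26, 34, 37, 57, 37, 29, 23, 28, 35]
age_group = [1, 2, 3, 4, 7, 4, 2, 1, 2, 3]

r = pearson_r(age, age_group)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([2.5 * a + 10 for a in age], age_group)
print(round(r, 4), round(r_rescaled, 4))
```

On these ten rows the two variables are strongly positively correlated, in line with the High Correlation warning for AGE and AGE_GROUP.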

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
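
A minimal sketch of that recipe follows: rank both variables, then apply the Pearson formula to the ranks. The function names and toy data are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention that the text above does not spell out).

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(spearman_rho(x, y), 4), round(pearson_r(x, y), 4))
```

Because y is a strictly increasing function of x, the ranks agree perfectly and ρ reaches its maximum, while Pearson's r stays below it.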

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
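
The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above (often called tau-a); the function name and toy data are hypothetical. Tied pairs count as neither concordant nor discordant here, whereas statistical libraries commonly default to a tie-corrected variant (tau-b), so results on data with many ties can differ.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))
```

This brute-force version visits every pair, so its cost grows quadratically with the number of observations; it is meant to show the definition, not to be run on all 29674 rows.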

Missing values

Sample

First rows

row    df_index  GRADUATE  AGE_GROUP  BILL_AMT5  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  LIMIT_BAL_GROUP  default   AGE  LIMIT_BAL
0             0       0.0        1.0        0.0       0.0     689.0       0.0       0.0       0.0       0.0              1.0      1.0  24.0    20000.0
1             1       0.0        2.0     3455.0       0.0    1000.0    1000.0    1000.0       0.0    2000.0              3.0      1.0  26.0   120000.0
2             2       0.0        3.0    14948.0    1518.0    1500.0    1000.0    1000.0    1000.0    5000.0              2.0      0.0  34.0    90000.0
3             3       0.0        4.0    28959.0    2000.0    2019.0    1200.0    1100.0    1069.0    1000.0              1.0      0.0  37.0    50000.0
4             4       0.0        7.0    19146.0    2000.0   36681.0   10000.0    9000.0     689.0     679.0              1.0      0.0  57.0    50000.0
5             5       1.0        4.0    19619.0    2500.0    1815.0     657.0    1000.0    1000.0     800.0              1.0      0.0  37.0    50000.0
6             6       1.0        2.0   483003.0   55000.0   40000.0   38000.0   20239.0   13750.0   13770.0              6.0      0.0  29.0   500000.0
7             7       0.0        1.0     -159.0     380.0     601.0       0.0     581.0    1687.0    1542.0              2.0      0.0  23.0   100000.0
8             8       0.0        2.0    11793.0    3329.0       0.0     432.0    1000.0    1000.0    1000.0              3.0      0.0  28.0   140000.0
9             9       0.0        3.0    13007.0       0.0       0.0       0.0   13007.0    1122.0       0.0              1.0      0.0  35.0    20000.0

Last rows

row    df_index  GRADUATE  AGE_GROUP  BILL_AMT5  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  LIMIT_BAL_GROUP  default   AGE  LIMIT_BAL
29664     29991       0.0        3.0     2500.0       0.0       0.0       0.0       0.0       0.0       0.0              5.0      1.0  34.0   210000.0
29665     29992       0.0        5.0        0.0    2000.0       0.0       0.0       0.0       0.0       0.0              1.0      0.0  43.0    10000.0
29666     29993       1.0        4.0    69473.0    2000.0  111784.0    4000.0    3000.0    2000.0    2000.0              2.0      0.0  38.0   100000.0
29667     29994       0.0        3.0    82607.0    7000.0    3500.0       0.0    7000.0       0.0    4000.0              2.0      1.0  34.0    80000.0
29668     29995       0.0        4.0    31237.0    8500.0   20000.0    5003.0    3047.0    5000.0    1000.0              5.0      0.0  39.0   220000.0
29669     29996       0.0        5.0     5190.0    1837.0    3526.0    8998.0     129.0       0.0       0.0              3.0      0.0  43.0   150000.0
29670     29997       0.0        4.0    20582.0       0.0       0.0   22000.0    4200.0    2000.0    3100.0              1.0      1.0  37.0    30000.0
29671     29998       0.0        5.0    11855.0   85900.0    3409.0    1178.0    1926.0   52964.0    1804.0              2.0      1.0  41.0    80000.0
29672     29999       0.0        6.0    32428.0    2078.0    1800.0    1430.0    1000.0    1000.0    1000.0              1.0      1.0  46.0    50000.0
29673     30000       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN              NaN      NaN   NaN        NaN